model (GLM).
Don’t confuse the generalized linear model with the very similarly named general linear
model. It’s unfortunate that these two names are almost identical, because they describe two very
different things. The general linear model is usually abbreviated LM, and the generalized
linear model is abbreviated GLM, so we will use those abbreviations. (However, some old
textbooks from the 1970s and earlier may use GLM to mean LM, because the generalized linear
model had not yet been invented or was not yet widely known.)
GLM is similar to LM in that the predictor variables usually appear in the model as the familiar linear
combination:
$$c_0 + c_1 x_1 + c_2 x_2 + c_3 x_3 + \cdots$$
where the x’s are the predictor variables, and the c’s are the regression coefficients (with $c_0$ being
called a constant term, or intercept).
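As a minimal sketch of how the linear combination is evaluated for one subject, the Python snippet below multiplies each predictor by its coefficient and adds the constant term. The function name linear_combination and the numbers passed to it are hypothetical, chosen only for illustration.

```python
def linear_combination(coefs, xs):
    """Return c0 + c1*x1 + c2*x2 + ... for one subject.

    coefs holds the constant term first, then one coefficient per predictor.
    """
    if len(coefs) != len(xs) + 1:
        raise ValueError("need one coefficient per predictor plus a constant term")
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], xs))


# Hypothetical coefficients (c0, c1, c2) and predictor values (x1, x2).
v = linear_combination([1.0, 0.5, -2.0], [4.0, 1.5])
print(v)  # 1.0 + 0.5*4.0 + (-2.0)*1.5
```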
But GLM extends the capabilities of LM in two important ways:
With LM, the outcome is assumed to be a continuous, normally distributed variable. But with
GLM, the outcome can be continuous or an integer. It can follow one of several different
distribution functions, such as normal, exponential, binomial (as in logistic regression), or Poisson.
With LM, the linear combination becomes the predicted value of the outcome, but with GLM, you
can specify a link function. The link function defines the transformation that connects the linear
combination to the predicted value. As we note in Chapter 18, logistic regression applies exactly this kind of
transformation: Let’s call the linear combination V. In logistic regression, V is sent through the
logistic function
$$\frac{1}{1 + e^{-V}}$$
to convert it into a predicted probability of having the outcome
event. (Strictly speaking, the link function is named for the reverse direction, from the predicted value
back to V. For logistic regression, the link is the logit, ln[p/(1 − p)], and the logistic function is its
inverse.) So if you select the binomial distribution and the logit link function, you can use GLM to
perform logistic regression.
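To make the transformation concrete, here is a small Python sketch of the logistic function, 1/(1 + e^(-V)), together with its inverse, the logit. The function names and the sample values of V are hypothetical and serve only as an illustration.

```python
import math


def logistic(v):
    """Turn a linear combination V into a predicted probability between 0 and 1."""
    # Note: math.exp(-v) overflows for extremely negative v; a production
    # routine would guard against that case.
    return 1.0 / (1.0 + math.exp(-v))


def logit(p):
    """Inverse of the logistic function: turn a probability back into V."""
    return math.log(p / (1.0 - p))


# Hypothetical values of V, just to show the shape of the transformation.
for v in (-2.0, 0.0, 2.0):
    print(v, round(logistic(v), 4))
```

A value of V = 0 maps to a probability of exactly 0.5, negative values map below 0.5, and positive values map above it.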
GLM is the Swiss army knife of regression. If you select the correct distribution and link function,
you can use it to do ordinary least-squares regression, logistic regression, Poisson regression, and a
whole lot more. Most statistical software offers a GLM function; that way, other specialized regressions
don’t need to be programmed separately. If the software you are using doesn’t offer logistic or Poisson
regression, check to see whether it offers GLM, and if it does, use that instead. (Flip to Chapter 4
for an introduction to statistical software.)
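The Swiss army knife idea can be sketched in a few lines of Python: one fitting routine, with the distribution and link swapped in by name. This is a bare-bones illustration for a single predictor, using iteratively reweighted least squares (a common way of fitting GLMs, though not necessarily what any particular package does internally). The names FAMILIES and fit_glm and all of the data are hypothetical, and real software adds convergence checks, standard errors, and much more.

```python
import math

# Each family bundles a link (predicted value -> linear combination), its
# inverse, a variance function, and an adjusted starting value for the mean.
FAMILIES = {
    "normal": {    # ordinary least-squares regression, identity link
        "link": lambda mu: mu,
        "inverse": lambda v: v,
        "variance": lambda mu: 1.0,
        "start": lambda y: y,
    },
    "binomial": {  # logistic regression, logit link
        "link": lambda mu: math.log(mu / (1.0 - mu)),
        "inverse": lambda v: 1.0 / (1.0 + math.exp(-v)),
        "variance": lambda mu: mu * (1.0 - mu),
        "start": lambda y: (y + 0.5) / 2.0,
    },
    "poisson": {   # Poisson regression, log link
        "link": lambda mu: math.log(mu),
        "inverse": lambda v: math.exp(v),
        "variance": lambda mu: mu,
        "start": lambda y: y + 0.1,
    },
}


def fit_glm(x, y, family, iterations=25):
    """Estimate (c0, c1) in V = c0 + c1*x for the chosen family."""
    fam = FAMILIES[family]
    # Starting linear combinations, taken from slightly adjusted outcomes.
    eta = [fam["link"](fam["start"](yi)) for yi in y]
    c0 = c1 = 0.0
    for _ in range(iterations):
        mu = [fam["inverse"](e) for e in eta]      # current predicted values
        w = [fam["variance"](m) for m in mu]       # working weights
        # Working outcome: linear combination plus a rescaled residual.
        z = [e + (yi - m) / wi for e, yi, m, wi in zip(eta, y, mu, w)]
        # Weighted least-squares regression of z on x.
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        c1 = (sw * swxz - swx * swz) / (sw * swxx - swx * swx)
        c0 = (swz - c1 * swx) / sw
        eta = [c0 + c1 * xi for xi in x]
    return c0, c1


# Made-up data, for illustration only.
x = [0, 1, 2, 3, 4]
print(fit_glm(x, [2, 5, 8, 11, 14], "normal"))   # points lie on y = 2 + 3x
print(fit_glm(x, [1, 3, 2, 6, 9], "poisson"))    # counts
print(fit_glm(x, [0, 0, 1, 0, 1], "binomial"))   # yes/no outcomes
```

The fitting loop never changes; only the entry looked up in FAMILIES does. This sketch assumes the data are well behaved (for example, the yes/no outcomes are not perfectly separated by x); it makes no attempt to handle the cases where the weights shrink to zero.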
Running a Poisson regression
Suppose that you want to study the number of fatal highway accidents per year in a city. Table 19-1
shows some made-up fatal-accident data over the course of 12 years. Figure 19-1 shows a graph of
this data, created using the R statistical software package.